Abstract. We introduce a parameterized version of set cover that generalizes several previously studied problems.
Given a ground set $V$ and a collection of subsets $S_i$ of $V$, a feasible solution is a partition of $V$ such that each subset
of the partition is included in one of the $S_i$. The problem involves maximizing the mean subset size of the partition,
where the mean is the generalized mean of parameter $p$, taken over the elements. For $p = -1$, the problem is
equivalent to the classical minimum set cover problem. For $p \to 0$, it is equivalent to the minimum entropy set cover
problem, introduced by Halperin and Karp. For $p = 1$, the problem includes the maximum-edge clique partition
problem as a special case. We prove that the greedy algorithm simultaneously approximates the problem within a
factor of $(p+1)^{1/p}$ for any $p \in \mathbb{R}^+$, and that this is the best possible unless P = NP. These results both generalize
and simplify previous results for special cases. We also consider the corresponding graph coloring problem, and
prove several tractability and inapproximability results. Finally, we consider a further generalization of the set cover
problem in which we aim at minimizing the sum of some concave function of the part sizes. As an application, we
derive an approximation ratio for a Rent-or-Buy set cover problem.

1 Introduction 

The greedy strategy is one of the simplest and best-known heuristics, and can be applied to many
combinatorial optimization problems. In the case of the minimum set cover problem, it involves iteratively 
choosing a subset that covers a maximum number of uncovered elements. We study this algorithm on a 
natural family of set covering problems in which the value of a subset depends on the number of elements it 
covers, and a parameter p encodes the way in which these values are combined. This parameter interpolates 
between different versions of the set covering problem, in particular between the classical minimum set 
cover problem, the minimum entropy set cover problem, and the simpler problem of finding a subset of 
maximum size. 
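
As a minimal sketch of the greedy strategy described above, the following Python code repeatedly picks the subset covering the most uncovered elements and records the resulting parts. The function name `greedy_cover` and the list-of-sets input format are illustrative assumptions, not notation from this paper.

```python
def greedy_cover(ground_set, subsets):
    """Greedy set cover; returns a list of (subset index, part) pairs.

    Hypothetical helper: `subsets` is a list of Python sets whose union
    is assumed to equal `ground_set`.
    """
    uncovered = set(ground_set)
    parts = []
    while uncovered:
        # Pick the subset covering the most still-uncovered elements
        # (ties are broken in favor of the lowest index).
        best = max(range(len(subsets)), key=lambda i: len(subsets[i] & uncovered))
        part = subsets[best] & uncovered
        if not part:
            raise ValueError("subsets do not cover the ground set")
        parts.append((best, part))
        uncovered -= part
    return parts


parts = greedy_cover(range(6), [{0, 1, 2, 3}, {0, 4}, {3, 5}, {4, 5}])
print([(i, sorted(p)) for i, p in parts])
```

On this toy instance the first subset is taken whole, and the last subset then covers the two remaining elements.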

Intuitively, the greedy algorithm should perform better for objective functions in which more importance 
is given to subsets covering many elements. We give a formal support to this intuition by showing that the 
greedy algorithm provides a constant factor approximation for all positive values of the parameter p. We 
further show that this is the best we can achieve unless P = NP. 

We first define some notation. Let $V$ be an $n$-element ground set and $\mathcal{S} = \{S_1, \ldots, S_k\}$ a collection
of $k$ subsets of $V$, whose union is $V$. In the minimum set cover problem, we seek a minimum size subset
$\mathcal{T} \subseteq \mathcal{S}$ such that $\bigcup_{S_i \in \mathcal{T}} S_i = V$. We define a cover as an assignment $\varphi : V \to \mathcal{S}$ of each element of $V$ to a
set of $\mathcal{S}$ such that $v \in \varphi(v)$ for all $v \in V$. This definition allows us to define alternative objective functions
for the set cover problem. Given a cover $\varphi$, let us define a part as a set $\varphi^{-1}(S_i)$ for some $S_i \in \mathcal{S}$. We use
the following two notations: $c_i := |\varphi^{-1}(S_i)|$ is the part size of the $i$th subset $S_i$ with respect to $\varphi$, and
$a_v := |\varphi^{-1}(\varphi(v))|$ is the size of the part containing the element $v$, with $v \in V$.

We define a new family of set cover problems in which we aim at maximizing the mean $M(\{a_v : v \in V\})$
of the values $a_v$. There exist many definitions of the mean $M(\{a_1, a_2, \ldots, a_n\})$ of a set of numbers.
The most widely used definition is the arithmetic mean: $M_1(\{a_1, a_2, \ldots, a_n\}) := \frac{1}{n} \sum_{i=1}^{n} a_i$. Another
well-known definition is the geometric mean: $M_0(\{a_1, a_2, \ldots, a_n\}) := (a_1 \cdot a_2 \cdots a_n)^{1/n}$. Finally, we also
consider the harmonic mean: $M_{-1}(\{a_1, a_2, \ldots, a_n\}) := n / \left(\sum_{i=1}^{n} a_i^{-1}\right)$. The arithmetic, geometric, and
harmonic means are special cases of the generalized mean:





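$$M_p(\{a_1, a_2, \ldots, a_n\}) = \left(\frac{1}{n} \sum_{i=1}^{n} a_i^p\right)^{1/p} = \left(\frac{1}{n} \sum_{i=1}^{k} c_i^{p+1}\right)^{1/p}. \tag{1}$$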

This value is the arithmetic mean for $p = 1$, and the harmonic mean for $p = -1$. It is well-known that
the limit of the generalized mean for $p \to 0$ is equal to the geometric mean. The generalized mean with
parameter $p$ is also called the normalized $L_p$-norm.
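
The following sketch evaluates the generalized mean for a few values of $p$ and checks that summing $a_v^p$ over the elements equals summing $c_i^{p+1}$ over the parts. The function name `generalized_mean` and the example part sizes are assumptions made for illustration only.

```python
import math


def generalized_mean(values, p):
    """Generalized mean M_p of positive numbers; p == 0 gives the geometric mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(a) for a in values) / n)
    return (sum(a ** p for a in values) / n) ** (1.0 / p)


# Hypothetical example: parts of sizes 3, 2, 1 over n = 6 elements.
part_sizes = [3, 2, 1]
a = [c for c in part_sizes for _ in range(c)]  # a_v for every element v

for p in (-1, 0, 1, 2):
    print(p, generalized_mean(a, p))

# Element-wise and part-wise sums agree: sum_v a_v^p == sum_i c_i^(p+1).
p = 2
assert sum(x ** p for x in a) == sum(c ** (p + 1) for c in part_sizes)
```

For $p = -1$ the sum of the $1/a_v$ equals the number of parts (3 here), which is why the harmonic mean of this example is $6/3 = 2$.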

Definition 1 (Maximum p-mean set cover). Given an $n$-element ground set $V$ and a collection $\mathcal{S} =
\{S_1, \ldots, S_k\}$ of subsets of $V$ whose union is $V$, find a cover $\varphi : V \to \mathcal{S}$ that maximizes $M_p(\{a_v : v \in V\})$,
where $a_v := |\varphi^{-1}(\varphi(v))|$, and $M_p$ is the generalized mean of parameter $p$.

Special Cases 

Interestingly, letting $p = -1$ (harmonic mean) or $p = 0$ (geometric mean) yields set cover problems that are
already known: the harmonic mean version is the minimum set cover problem, while the geometric mean
version is the minimum entropy set cover problem [19]. A special case of the maximum $p$-mean set cover
problem for $p = 1$ has recently been introduced in the form of a graph coloring problem [9].

Minimum Set Cover. The maximum harmonic mean set cover problem can be cast as $\min_\varphi \sum_{v \in V} \frac{1}{a_v}$. We
can rewrite this objective function as

$$\sum_{v \in V} \frac{1}{a_v} = \sum_{S_i \in \mathcal{S}} \sum_{v \in \varphi^{-1}(S_i)} \frac{1}{c_i} = |\{i : c_i \neq 0\}|.$$

Hence the maximum harmonic mean set cover problem is the standard minimum set cover problem.

This problem is among the most studied NP-hard problems. It has long been known to be approximable
within a factor $H_{\max_i |S_i|}$ with the greedy algorithm, where $H_m$ denotes the $m$th harmonic number. The first
proof is from Johnson [20]. Lovasz [23] obtained the same factor with a different method. Later, Chvatal
extended the result to the weighted set cover problem [8], in which the subsets $S_i$ have nonuniform costs.
A number of papers show that the logarithmic approximation guarantee is likely to be optimal. Lund and
Yannakakis [24] first proved that the problem is not approximable within $\log n / 4$ unless
NP $\subseteq$ DTIME($n^{\mathrm{polylog}(n)}$). This result has been improved to $(1 - o(1)) \ln n$ by Feige, under the hypothesis
NP $\not\subseteq$ DTIME($n^{O(\log \log n)}$). Raz and Safra, and Alon, Moshkovitz, and Safra proved inapproximability
results for factors of the form $c \ln n$ for some constant $c$ under the hypothesis P $\neq$ NP. These results are
consequences of new PCP characterizations of NP.

Minimum Entropy Set Cover. Let us now consider the geometric mean version: $\max_\varphi \left(\prod_{v \in V} a_v\right)^{1/n}$. We
relate this mean to the entropy of the discrete probability distribution found by dividing each part size by $n$:

$$-\sum_{i=1}^{k} \frac{c_i}{n} \log \frac{c_i}{n} = -\frac{1}{n} \sum_{v \in V} \log \frac{a_v}{n} = \log n - \log M_0(\{a_v : v \in V\}).$$

Thus the maximum geometric mean set cover problem is equivalent to the problem of minimizing the
entropy of the partition. This problem is known as the minimum entropy set cover problem. It was
introduced by Halperin and Karp [19], and has applications in the field of computational biology. They proved
that the greedy algorithm approximates the problem within a constant additive term. Improving on
this work, Cardinal, Fiorini, and Joret [6] provided a simple analysis showing that the constant is at most
$\log_2 e \approx 1.4427$ bits, and that this is the smallest additive error achievable in polynomial time, unless
P = NP. The minimum entropy vertex cover and minimum entropy graph coloring [5] problems, which
are special cases of minimum entropy set cover, have been studied by the same authors.

Footnote. We use the word $p$-mean here, in order to avoid confusion with the "minimum $L_p$-norm set cover" problem [15].
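
The identity between the entropy of the part-size distribution and $\log n - \log M_0$ can be checked numerically. The sketch below uses hypothetical part sizes and base-2 logarithms; it is only an illustration of the relation, not part of the original analysis.

```python
import math

part_sizes = [3, 2, 1]          # hypothetical part sizes c_i
n = sum(part_sizes)

# Entropy (in bits) of the distribution (c_i / n).
entropy = -sum((c / n) * math.log2(c / n) for c in part_sizes)

# log2 of the geometric mean of the a_v: each part contributes c_i copies of c_i.
log_m0 = sum(c * math.log2(c) for c in part_sizes) / n

assert abs(entropy - (math.log2(n) - log_m0)) < 1e-12
print(entropy)
```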

Maximum-Edge Clique Partition. In a recent publication [9], Dessmark, Jansson, Lingas, Lundell, and
Persson studied the maximum-edge clique partition (Max-ECP) problem. In this problem, we aim to partition
a graph $G$ into cliques in order to maximize the number of edges whose endpoints are in the same clique
of the partition. This is an implicit set cover problem, in which the subsets $S_i$ are the cliques of $G$, and the
function to maximize is:

$$\sum_{i=1}^{k} \binom{c_i}{2} = \frac{1}{2} \sum_{v \in V} (a_v - 1) = \frac{n}{2} \left(M_1(\{a_v : v \in V\}) - 1\right).$$

Thus the problem can be seen as an implicit maximum $p$-mean set cover problem for $p = 1$. They show that
the problem is 2-approximable on perfect graphs using the greedy algorithm, and that it is not approximable
within a factor $n^{1 - O(1/(\log n)^\gamma)}$ for some constant $\gamma$ in polynomial time unless NP $\subseteq$ ZPTIME($2^{(\log n)^{O(1)}}$).

Max-Max and Max-Min Set Cover. When $p \to \infty$, the maximum $p$-mean set cover problem involves finding
a cover in which the largest part has maximum size. This is a trivial problem, unless the subsets in $\mathcal{S}$ are
not given explicitly, as in the graph coloring problem. For $p \to -\infty$, the problem is that of maximizing
the size of the smallest part, thus solving $\max_\varphi \min_{v \in V} a_v = \max_\varphi \min_{i : c_i \neq 0} c_i$. This problem seems much
more challenging. We will refer to it as the max-min set cover problem.



Our results

We show in Section 2 that for any $p \in \mathbb{R}^+$, the maximum $p$-mean set cover problem is approximable within
a factor of $(p+1)^{1/p}$. This factor is less than $e$ for all positive values of $p$, hence this can be seen as a
robust $e$-approximation for all $p$-means with positive $p$. This result generalizes the approximability results
of Cardinal et al. [6] for the case $p \to 0$, and of Dessmark et al. [9] for $p = 1$. We also prove that this
is the best we can achieve in polynomial time unless P = NP, using a powerful reduction due to Feige et
al. [11]. When $p$ is negative, we show that the performance of the greedy algorithm degrades. We give an
inapproximability result for max-min set cover.

Graph coloring problems can be seen as implicit set cover problems in which the subsets $S_i$ are the
maximal independent sets of the graph. The subsets are not given explicitly, which would cause an
exponential blowup in the problem size, but rather implicitly, from the graph structure. We define the maximum
$p$-mean graph coloring problem in this natural way. Special cases of the maximum $p$-mean graph coloring
problem include the standard minimum coloring problem ($p = -1$), the minimum entropy coloring
problem [5] ($p \to 0$), the maximum-edge clique partition problem [9] ($p = 1$), and the maximum independent set
problem ($p \to +\infty$). In Section 3 we give approximability and inapproximability results for this problem.

The maximum $p$-mean set cover problem involves maximizing the sum of the $(p+1)$th powers of
the part sizes, as can be seen in equation (1). In Section 4 we consider weighted instances, and a further
generalization of the set cover problem, in which we aim at minimizing the sum of some concave function
of the part sizes. We give a closed form of the approximation ratio achieved by the greedy algorithm for this
general class of problems, and apply this result to the case of the Rent-or-Buy set cover problem [12].

Related works 

Minimum sum set cover. In the minimum sum set cover problem we aim to find an ordering of the subsets
that minimizes the average cover time of an element of the ground set, where the cover time of an element is
the index of the first subset covering it. This problem was first considered in its graph coloring version [4].
Feige, Lovasz, and Tetali [11] gave an elegant proof of the fact that greedy is a 4-approximation algorithm,
and that this is the best one can hope for unless P = NP. They also studied the related minimum sum
vertex cover problem, for which they provided a 2-approximation algorithm.

Generalizations of minimum sum set cover. Munagala, Babu, Motwani, and Widom [25] introduced the
pipelined set cover problem. In this problem, we aim to find an ordering of the subsets in $\mathcal{S}$ that minimizes
the $L_p$-norm of the vector $(R_i)$, where $R_i$ is the number of elements that are not contained in any of the
first $(i - 1)$ subsets. For $p = 1$, the problem is equivalent to the minimum sum set cover problem. They
generalize the technique of Feige et al. [11] to prove a $4^{1/p}$-approximation.

More recently, Golovin, Gupta, Kumar, and Tangwongsan [15] considered another minimum $L_p$-norm
set cover problem. This variant involves finding an ordering of the subsets minimizing the $L_p$-norm of the
cover time vector. This problem is a simultaneous generalization of the minimum set cover problem and
the minimum sum set cover problem. They prove that the greedy algorithm provides an $O(p)$-approximate
solution, and that this is the best possible, up to a constant factor, unless NP $\subseteq$ DTIME($n^{O(\log \log n)}$).

Graph Coloring. The greedy algorithm for set cover translates to the MaxIS algorithm for graph coloring, in
which a maximum independent set is iteratively chosen as a new color class. This algorithm has in particular
been analyzed for the minimum sum [4] and minimum entropy [5, 6] graph coloring problems.

Recently, Fukunaga, Halldorsson, and Nagamochi [13] initiated the study of a very general family of
minimum cost graph coloring problems, similar to what we propose in Section 4. They proved that any
minimum cost graph coloring problem in this family is 4-approximable on weighted interval graphs,
provided that the cost function is both monotone and concave. The proposed algorithm iteratively removes a
maximum $i$-colorable subgraph, where $i$ is doubled at each iteration.

In another recent contribution, Fukunaga, Halldorsson, and Nagamochi [12] introduced the Rent-or-Buy
coloring problem in vertex-weighted graphs, in which the cost of a color class is the minimum between 1
and the total weight of the class. This models situations in which each color class has to be paid for either by
"buying" it for a fixed cost, or "renting" it for a price proportional to its size. They gave, among other results,
a 2-approximation for this problem in perfect graphs. We consider the set cover version of this problem in
Section 4.

Clique Partitioning with Value-polymatroidal Costs. Gijswijt, Jost, and Queyranne [14] recently studied
clique partitioning problems with value-polymatroidal cost functions. A function $f$ over the subsets of $V$ is
said to be value-polymatroidal whenever $f(\emptyset) = 0$, $f$ is non-decreasing, and for all subsets $S$ and $T$ with
$f(S) \ge f(T)$, and every $u \in V \setminus (T \cup S)$, the inequality $f(S + u) - f(S) \le f(T + u) - f(T)$ holds. They
define the cost of a clique partition as the sum of the costs of its cliques. They prove, among other results,
that this problem is solvable in polynomial time on interval graphs.






Minimum $L_p$-norm problems. Azar, Epstein, Richter, and Woeginger studied approximation algorithms
for a scheduling problem in which we aim to minimize the $L_p$-norm of the part sizes. A similar problem has
been studied by Azar and Taub, who proposed all-norm approximation algorithms. Although similar
in spirit, the goal is different from ours, since we instead seek the most "nonuniform" distribution, with
maximum $L_p$-norm.

A number of other problems with general cost functions have been studied, such as facility location.
Due to space constraints, we do not give more details here.



2 Approximability 

Lemma 1. The maximum $p$-mean set cover problem for $p \in \mathbb{R}$ is approximable in polynomial time within a
factor of

$$\left(\frac{n^{p+1}}{\sum_{j=1}^{n} j^p}\right)^{1/p}. \tag{2}$$



Proof. We consider an optimal cover $\varphi_{OPT}$, and a part $C_i = \varphi_{OPT}^{-1}(S_i)$ in this cover, of size $|C_i| = c_i$. We
define $a'_v := |\varphi^{-1}(\varphi(v))|$ for the cover $\varphi$ returned by the greedy algorithm.

We first suppose that $p > 0$, and give a lower bound on the value of the cover $\varphi$ restricted to $C_i$. We do
so by examining the elements of $C_i$ in the order in which they are covered by the greedy algorithm, breaking
ties arbitrarily. The first covered element $v_1 \in C_i$ must belong to a part of size at least $c_i$ in $\varphi$, since $C_i$
can be chosen as a new part, and the greedy algorithm chooses the largest part. Hence $a'_{v_1} \ge c_i$. Similarly,
the second element $v_2$ of $C_i$ that is covered by greedy must belong to a part of size at least $c_i - 1$. Hence
$a'_{v_2} \ge c_i - 1$. In general, for the $k$th element $v_k$ covered by the greedy algorithm, $a'_{v_k} \ge c_i - k + 1$. Thus
we have

$$\sum_{v \in C_i} (a'_v)^p \ge \sum_{j=1}^{c_i} j^p. \tag{3}$$

Letting $a_v := |\varphi_{OPT}^{-1}(\varphi_{OPT}(v))|$, the corresponding value for $\varphi_{OPT}$ is $\sum_{v \in C_i} a_v^p = c_i^{p+1}$, hence we get the
following upper bound

$$\frac{\sum_{v \in C_i} a_v^p}{\sum_{v \in C_i} (a'_v)^p} \le \frac{c_i^{p+1}}{\sum_{j=1}^{c_i} j^p}. \tag{4}$$

This ratio is increasing with $c_i$, and holds for all the parts $C_i$ of $\varphi_{OPT}$. Letting $c_i = n$ and taking the $p$th root
gives the result.

A similar reasoning holds for $p < 0$, with the direction of inequalities (3) and (4) reversed. □

The approximation ratios for various values of $p$ and $n$ are given in Fig. 1. We next give a constant upper
bound on the approximation ratio in the case $p > 0$. We need the following lemma.



Lemma 2. For $p \in \mathbb{R}^+$ and $n \in \mathbb{N}$,

$$\sum_{j=1}^{n} j^p \ge \frac{n^{p+1}}{p+1}.$$

Fig. 1. Approximation ratios for the greedy algorithm.



Proof. The inequality holds for $p = 0$. For $p > 0$, it can be checked graphically that approximating the sum
by an integral yields a lower bound:

$$\sum_{j=1}^{n} j^p = \sum_{j=0}^{n} j^p > \int_0^n x^p \, dx = \frac{n^{p+1}}{p+1}.$$

□

Combining Lemmas 1 and 2 proves the following theorem. Tightness can be proved using known tight
examples for special cases (see for instance [6]).

Theorem 1. The maximum $p$-mean set cover problem is approximable in polynomial time within a factor of
$(p+1)^{1/p}$ for $p \in \mathbb{R}^+$. This bound is asymptotically tight.

Note that $\lim_{p \to +\infty} (p+1)^{1/p} = 1$, hence in the case of $p \to +\infty$, the approximation ratio is equal
to 1. This formalizes the trivial observation that if our goal is to maximize the size of the largest part,
then the greedy algorithm returns an optimal solution. Also, $\lim_{p \to 0} (p+1)^{1/p} = e$, which proves that
the greedy algorithm approximates the minimum entropy set cover within an additive term of $\log e$
bits. This was shown by Cardinal, Fiorini, and Joret [6]. Finally, for $p = 1$, the greedy algorithm returns
a 2-approximation. A proof of this result was given by Dessmark, Jansson, Lingas, Lundell, and Persson [9].
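
To see how the finite-$n$ factor of Lemma 1 compares with the constant bound of Theorem 1, the sketch below evaluates $(n^{p+1} / \sum_{j=1}^{n} j^p)^{1/p}$, our reading of expression (2), against $(p+1)^{1/p}$. The function name `greedy_ratio` and the sample values of $p$ and $n$ are illustrative assumptions.

```python
import math


def greedy_ratio(n, p):
    """Factor (n^(p+1) / sum_{j=1..n} j^p)^(1/p), for p != 0."""
    return (n ** (p + 1) / sum(j ** p for j in range(1, n + 1))) ** (1.0 / p)


for p in (0.01, 0.5, 1, 2, 10):
    bound = (p + 1) ** (1.0 / p)
    ratio = greedy_ratio(1000, p)
    # Lemma 2 implies ratio <= bound, and the bound stays below e for p > 0.
    assert ratio <= bound < math.e
    print(p, round(ratio, 4), round(bound, 4))
```

The bound approaches $e$ as $p$ shrinks toward 0 and approaches 1 as $p$ grows, matching the two limits discussed above.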



We now turn to the case $p < 0$. We know that the greedy algorithm approximates the problem for
$p = -1$ within a logarithmic factor. The following result shows that the performance of greedy degrades
dramatically as $p$ becomes smaller.

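Theorem 2. The maximum $p$-mean set cover problem is approximable in polynomial time within a factor of
$n^{1 - \frac{1}{q}} \zeta(q)^{\frac{1}{q}}$ for any real $p = -q < -1$, where $\zeta(q) = \sum_{j=1}^{\infty} j^{-q}$ is the Riemann zeta function.

Proof. We consider expression (2) in Lemma 1 and replace $p$ by $-q$:

$$\left(\frac{n^{1-q}}{\sum_{j=1}^{n} j^{-q}}\right)^{-\frac{1}{q}} = n^{1 - \frac{1}{q}} \left(\sum_{j=1}^{n} j^{-q}\right)^{\frac{1}{q}} \le n^{1 - \frac{1}{q}} \zeta(q)^{\frac{1}{q}}. \tag{5}$$

□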



The bound is asymptotically tight if we replace $n$ by $\max_i |S_i|$. Note that we need $q > 1$, otherwise the
Dirichlet series defining the zeta function does not converge. In particular, when $q = 1$ (and thus $p = -1$),
we have the harmonic series, whose partial sum $H_n$ is the approximation ratio for the minimum set cover problem.

An interesting special case is when $p = -2$. This means that the cost of a part of size $c_i$ in the cover is
$1/c_i$. In that case, the approximation ratio of the greedy algorithm becomes

$$n^{1 - \frac{1}{2}} \zeta(2)^{\frac{1}{2}} = \pi \sqrt{\frac{n}{6}}. \tag{6}$$
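
As a quick numerical sanity check of the case $p = -2$, the sketch below compares the finite-sum expression $n^{1 - 1/q} (\sum_{j \le n} j^{-q})^{1/q}$ with the closed form $\pi \sqrt{n/6}$ obtained from $\zeta(2) = \pi^2/6$. The value of $n$ is an arbitrary choice made for illustration.

```python
import math

n = 10_000
q = 2  # p = -q = -2
# Finite-sum expression from the proof, and its zeta-function upper bound.
exact = n ** (1 - 1 / q) * sum(j ** -q for j in range(1, n + 1)) ** (1 / q)
bound = math.pi * math.sqrt(n / 6)
assert exact <= bound
print(exact, bound)
```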

We now show that the approximability result in Theorem 1 for positive values of $p$ is the best we can
hope for, unless P = NP. We need the following lemma, which is a simple consequence of the convexity of
the function $f(x) = x^{p+1}$.

Consider two sorted sequences $c_1 \ge c_2 \ge \cdots \ge c_k$ and $c'_1 \ge c'_2 \ge \cdots \ge c'_k$. We say that $(c_i)$ dominates
$(c'_i)$ if

$$\sum_{i=1}^{j} c_i \ge \sum_{i=1}^{j} c'_i \quad \forall j \in \{1, 2, \ldots, k\}. \tag{7}$$



Lemma 3. If $(c_i)$ dominates $(c'_i)$, then for any $p \in \mathbb{R}^+$,

$$\sum_{i=1}^{k} c_i^{p+1} \ge \sum_{i=1}^{k} (c'_i)^{p+1}. \tag{8}$$
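
The domination relation and the inequality of Lemma 3 are easy to test on small examples. The sketch below is illustrative only: the helper name `dominates` and the two sequences are assumptions, not taken from the paper.

```python
from itertools import accumulate


def dominates(c, c_prime):
    """True if every prefix sum of c is >= the matching prefix sum of c_prime.

    Both sequences are assumed sorted in nonincreasing order and of equal length.
    """
    return all(s >= t for s, t in zip(accumulate(c), accumulate(c_prime)))


c, c_prime = [4, 1, 1], [2, 2, 2]   # hypothetical part sizes, both summing to 6
assert dominates(c, c_prime) and not dominates(c_prime, c)

# The more unbalanced sequence has the larger sum of (p+1)th powers.
for p in (0.5, 1, 3):
    assert sum(x ** (p + 1) for x in c) >= sum(x ** (p + 1) for x in c_prime)
```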

Theorem 3. It is NP-hard to approximate the maximum $p$-mean set cover problem within a factor less than
$(p+1)^{1/p}$ for $p \in \mathbb{R}^+$.

Proof. Feige, Lovasz, and Tetali [11] gave a procedure for transforming a 3SAT-6 formula into a set system
$(V, \mathcal{S})$ with the following properties:

- each subset $S_i \in \mathcal{S}$ has size $n/t$ for a certain parameter $t$,

- if the formula is satisfiable, then there exists an exact cover of $V$ with $t$ subsets,

- if the formula is $\delta$-satisfiable, that is, if at most a fraction $\delta$ of the clauses can be satisfied, then every $i$
subsets of $\mathcal{S}$ cover at most a fraction $(1 - (1 - 1/t)^i) - \epsilon$ of the elements of $V$, for $i \in \{1, 2, \ldots, \alpha t\}$
and any choice of the constants $\epsilon > 0$ and $\alpha > 0$.

Given a formula known to be either satisfiable or $\delta$-satisfiable, the problem of distinguishing between the
two is NP-hard [11]. Using the transformation above, we show that a polynomial algorithm with an
approximation ratio less than $(p+1)^{1/p}$ for maximum $p$-mean set cover would solve this problem.

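If the formula is satisfiable, then $V$ can be covered by exactly $t$ disjoint sets of $\mathcal{S}$. From Lemma 3, this
is the optimal solution. The part sizes $c_i$ in this solution satisfy

$$\sum_{i=1}^{k} c_i^{p+1} = \sum_{i=1}^{t} \left(\frac{n}{t}\right)^{p+1} = \frac{n^{p+1}}{t^p}. \tag{9}$$

We now suppose the formula is only $\delta$-satisfiable. We consider the distribution in which the $i$th part
covers a fraction

$$(1 - (1 - 1/t)^i) - (1 - (1 - 1/t)^{i-1}) = \frac{1}{t} \left(1 - \frac{1}{t}\right)^{i-1}$$

of the elements of $V$, for $i \in \{1, 2, \ldots, \alpha t\}$, and the remaining parts cover exactly a fraction $\frac{1}{t} \left(1 - \frac{1}{t}\right)^{\alpha t}$
each. We denote by $r$ the number of remaining parts, so that the sum of the fractions equals 1. From Lemma 3
and the properties of the reduction, this distribution dominates all other achievable distributions. Therefore
the following upper bound holds.

$$\sum_{i=1}^{k} c_i^{p+1} \le \sum_{i=1}^{\alpha t} \left(\frac{n}{t} \left(1 - \frac{1}{t}\right)^{i-1}\right)^{p+1} + r \left(\frac{n}{t} \left(1 - \frac{1}{t}\right)^{\alpha t}\right)^{p+1} \tag{10}$$

$$\approx \left(\frac{n}{t}\right)^{p+1} \left(\sum_{i=0}^{\alpha t} e^{-\frac{(p+1) i}{t}} + r e^{-\alpha (p+1)}\right). \tag{11}$$

We can approximate the sum by an integral:

$$\sum_{i=0}^{\alpha t} e^{-\frac{(p+1) i}{t}} \approx \int_0^{\alpha t} e^{-\frac{(p+1) x}{t}} \, dx = \frac{t}{p+1} \left(1 - e^{-\alpha (p+1)}\right). \tag{12}$$

The value $r$ is the number of parts of size $\frac{n}{t} \left(1 - \frac{1}{t}\right)^{\alpha t} \approx \frac{n}{t} e^{-\alpha}$ needed to cover a fraction
$1 - \sum_{i=1}^{\alpha t} \frac{1}{t} \left(1 - \frac{1}{t}\right)^{i-1} \approx e^{-\alpha}$ of the elements. Thus $r \approx t$, and

$$r e^{-\alpha (p+1)} \approx t e^{-\alpha (p+1)}.$$

Note that since the constant $t$ can be assumed to be arbitrarily large [11], the approximations above are
arbitrarily accurate. Hence expression (11) can be made arbitrarily close to:

$$\frac{n^{p+1}}{t^p} \left(\frac{1}{p+1} \left(1 - e^{-\alpha (p+1)}\right) + e^{-\alpha (p+1)}\right). \tag{13}$$

Now by choosing $\alpha$ sufficiently large, the ratio between (9) and (13) can be made arbitrarily close to $p + 1$.
The gap between the $p$-means is obtained by taking the $p$th root. □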
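
The convergence of the gap to $p + 1$ can be illustrated numerically. The sketch below assumes that the satisfiable case has value $n^{p+1}/t^p$ and that the $\delta$-satisfiable case is bounded by $\frac{n^{p+1}}{t^p} \left(\frac{1}{p+1}(1 - e^{-\alpha(p+1)}) + e^{-\alpha(p+1)}\right)$, which is our reading of expressions (9) and (13); the function name `gap` is hypothetical.

```python
import math


def gap(alpha, p):
    """Ratio between the satisfiable value and the limiting upper bound."""
    decay = math.exp(-alpha * (p + 1))
    return 1.0 / ((1.0 - decay) / (p + 1) + decay)


p = 1.0
for alpha in (1, 2, 5, 10):
    # Second column: gap on sums of powers; third column: gap on the p-means.
    print(alpha, gap(alpha, p), gap(alpha, p) ** (1.0 / p))

assert abs(gap(50, p) - (p + 1)) < 1e-9
```

With $\alpha = 0$ the two expressions coincide (gap 1), and the gap increases toward $p + 1$ as $\alpha$ grows.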

In the case $p \to 0$, the above inapproximability proof shows that the additive $\log e$ error term is best
possible (unless P = NP) for the minimum entropy set cover problem. This was also shown previously by
Cardinal, Fiorini, and Joret [6].



Although we do not have a precise inapproximability threshold for negative values of $p$, we can prove
the following result for $p \to -\infty$. This is the max-min set cover problem, in which we aim to maximize the
size of the smallest part.

Theorem 4. It is NP-hard to approximate the max-min set cover problem within any constant factor. 

Proof. The proof uses the same reduction as the proof of Theorem 3. We consider set systems $(V, \mathcal{S})$
constructed from a 3SAT-6 formula, such that there exists an exact cover with $t$ parts of size $n/t$ if the formula
is satisfiable, and every $i$ subsets of $\mathcal{S}$ cover at most a fraction $(1 - (1 - 1/t)^i) - \epsilon$ of the elements, for
$i \in \{1, 2, \ldots, \alpha t\}$, if the formula is $\delta$-satisfiable. But this means that in the latter case, at least $\alpha t$ subsets
are needed to cover $V$. This implies that there is a part of size at most $\frac{n}{\alpha t}$. Since $\alpha$ can be chosen arbitrarily
greater than any constant, the gap can be made arbitrarily large. □

3 Graph Coloring



We now define the graph coloring variant of the maximum p-mean set cover problem. 

Definition 2 (Maximum p-mean graph coloring). Given a simple, undirected graph $G = (V, E)$, find
an assignment $\varphi : V \to \mathbb{N}$ of colors to vertices such that adjacent vertices receive different colors, and
$M_p(\{a_v : v \in V\})$ is maximized, where $a_v := |\varphi^{-1}(\varphi(v))|$ and $M_p$ is the generalized mean with parameter
$p$.

The greedy algorithm extends naturally to what is referred to as the MaxIS algorithm, in which a
maximum independent set is iteratively removed from the graph. This procedure can run in polynomial time
only if at each step we can find a maximum independent set in polynomial time. This is true for large families
of graphs, such as perfect graphs [16] and claw-free graphs [26]. We thus have the following corollary of
Theorem 1 (the proof of tightness is omitted).

Corollary 1. The maximum $p$-mean graph coloring problem restricted to perfect or claw-free graphs is
approximable in polynomial time within a factor of $(p+1)^{1/p}$ for $p \in \mathbb{R}^+$. This bound is asymptotically tight.
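
The MaxIS procedure can be sketched on tiny graphs as follows. This is only an illustration: the maximum independent set is found by brute force (exponential time), whereas the corollary relies on polynomial-time algorithms for perfect or claw-free graphs, and the function names are our own.

```python
from itertools import combinations


def max_independent_set(vertices, edges):
    """Brute-force maximum independent set; exponential, for tiny graphs only."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(sorted(vertices), size):
            chosen = set(cand)
            if not any(u in chosen and v in chosen for u, v in edges):
                return chosen
    return set()


def maxis_coloring(vertices, edges):
    """Repeatedly remove a maximum independent set; returns the color classes."""
    remaining, classes = set(vertices), []
    while remaining:
        live_edges = [(u, v) for u, v in edges if u in remaining and v in remaining]
        cls = max_independent_set(remaining, live_edges)
        classes.append(cls)
        remaining -= cls
    return classes


# Path on 5 vertices: 0-1-2-3-4.
classes = maxis_coloring(range(5), [(0, 1), (1, 2), (2, 3), (3, 4)])
print([sorted(c) for c in classes])
```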

It may happen that we only have an approximate algorithm for the maximum independent set problem.
Then the following result applies. Proofs are given in Appendix A.

Theorem 5. If the maximum independent set problem can be approximated within a factor $\rho$ in polynomial
time, then the maximum $p$-mean graph coloring problem is approximable within a factor of $\rho (p+1)^{1/p}$ in
polynomial time.

Corollary 2. The minimum entropy coloring problem [5] is approximable in polynomial time within an
additive error of $\log_2(\Delta + 2) - 0.14226$ on graphs with maximum degree $\Delta$.

In the max-min graph coloring problem, that is when $p \to -\infty$, we aim to maximize the size of the
smallest color class. Using a recent polynomial algorithm from Kierstead and Kostochka to construct
equitable $(\Delta + 1)$-colorings [22], we have the following approximability result.

Corollary 3. The max-min graph coloring problem can be approximated in polynomial time within a factor
$\left(1 + O\left(\frac{\Delta}{n}\right)\right) \frac{\Delta + 1}{\chi}$ on graphs of order $n$, maximum degree $\Delta$, and chromatic number $\chi$.

The maximum independent set problem is the special case of maximum $p$-mean coloring in which
$p \to +\infty$. It is therefore not surprising that the general coloring problem is not well approximable for any positive
value of $p$, as the following lemma shows.

Lemma 4. If the maximum independent set problem cannot be approximated in polynomial time within
$n^{1 - \epsilon}$ for some $\epsilon = \epsilon(n)$, then the maximum $p$-mean graph coloring problem with $p \in \mathbb{R}^+$ cannot be
approximated in polynomial time within $n^{1 - \left(2 + \frac{1}{p}\right) \epsilon}$.

Proof. If the maximum independent set cannot be approximated within $n^{1 - \epsilon}$, then we can safely assume
that this holds for graphs having an independent set of size $\alpha \ge n^{1 - \epsilon}$. In such a graph, we consider the
coloring obtained with an $n^{1 - t\epsilon}$-approximation algorithm for maximum $p$-mean coloring, for some constant
$t$ to be fixed later.

The optimal solution in this graph has value at least $(\alpha^{p+1})^{1/p}$ (we omit the normalization factor $n^{-1/p}$
throughout, since it cancels in the ratio). Thus the value $A$ of the coloring satisfies

$$A \ge \frac{\alpha^{\frac{p+1}{p}}}{n^{1 - t\epsilon}}. \tag{14}$$

We now consider the largest color class in this coloring, and denote its size by $h$. We then get the following
upper bound on $A$:

$$A \le \left(\frac{n}{h} h^{p+1}\right)^{\frac{1}{p}} = h n^{\frac{1}{p}}. \tag{15}$$

Putting this together, we obtain

$$h n^{\frac{1}{p}} \ge \frac{\alpha^{\frac{p+1}{p}}}{n^{1 - t\epsilon}} \ge \frac{n^{(1 - \epsilon) \frac{p+1}{p}}}{n^{1 - t\epsilon}}, \tag{16}$$

$$h \ge n^{\left(t - 1 - \frac{1}{p}\right) \epsilon}. \tag{17}$$

Letting $t = 2 + \frac{1}{p}$, we obtain an independent set of size at least $n^{\epsilon}$, which is an $n^{1 - \epsilon}$-approximation for the
maximum independent set problem, a contradiction. □
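
For readers who want the exponent bookkeeping behind the last step spelled out, here is a short worked computation. It assumes, as in the proof, that $\alpha \ge n^{1-\epsilon}$ and that the unnormalized value of the coloring is at most $h n^{1/p}$; it is our own expansion of the argument rather than part of the original text.

```latex
\begin{align*}
h\, n^{1/p} &\ge \frac{n^{(1-\epsilon)\frac{p+1}{p}}}{n^{1-t\epsilon}}
  = n^{\,1 + \frac{1}{p} - \left(1+\frac{1}{p}\right)\epsilon - 1 + t\epsilon}
  = n^{\,\frac{1}{p} + \left(t - 1 - \frac{1}{p}\right)\epsilon},\\
h &\ge n^{\left(t - 1 - \frac{1}{p}\right)\epsilon}
  \quad\Longrightarrow\quad h \ge n^{\epsilon} \text{ for } t = 2 + \tfrac{1}{p}.
\end{align*}
```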

Applying this lemma and using a result from Khot, we obtain the following.

Theorem 6. The maximum $p$-mean graph coloring problem, for $p \in \mathbb{R}^+$, is not approximable in polynomial
time within a factor $n^{1 - O(1/(\log n)^\gamma)}$ for some constant $\gamma$ unless NP $\subseteq$ ZPTIME($2^{(\log n)^{O(1)}}$).

A similar result for $p \to 0$ was proved by Cardinal et al. [5]. The special case $p = 1$ was proved by
Dessmark et al. [9].



We end our discussion of the graph coloring problems with the equivalent problem in the complement
of the graph $G$, which we call the maximum $p$-mean clique partition problem. The Max-ECP problem
corresponds to the special case $p = 1$. Gijswijt, Jost, and Queyranne [14] provided an $O(n^3)$ dynamic
programming algorithm for finding a partition of interval graphs in cliques that minimizes the sum of a
value-polymatroidal cost. Unfortunately, our objective function does not fall in that class, since the equivalent
minimization problem involves minimizing a concave decreasing cost function, and value-polymatroidal
functions must be non-decreasing. However, the correctness of their dynamic programming solely relies on
the fact that an optimal partition always contains a maximal clique. This is true in our case as well, at least
for $p > 0$, and is a consequence of Lemma 3. Thus the algorithm can be applied and we get the following
results.

Theorem 7. The maximum $p$-mean clique partition problem with $p \in \mathbb{R}^+$ can be solved in $O(n^3)$ time on
interval graphs.

Corollary 4. The Max-ECP problem [9] can be solved in $O(n^3)$ time on interval graphs.



4 Further Generalizations 



Weighted variant. We first observe that Theorems 1 and 3 also hold for a weighted version of the maximum
$p$-mean set cover problem. In this problem, the elements $v$ of $V$ have a weight $w(v)$. The objective function is
the same, except that $a_v$ is now defined as $w(\varphi^{-1}(\varphi(v)))$. We can observe that the approximability proofs
above still hold using a simple reduction for integer weights. Given a weighted instance, we can transform
it into an unweighted instance by replacing each element $v \in V$ by $w(v)$ copies of it, each belonging to the
same subsets as $v$. Then each copy of the duplicated elements must belong to the same part of the (greedy or
optimal) solution. Otherwise, from Lemma 3, some elements can be reassigned so that the $p$-mean increases.
The argument extends to rational and, by continuity, real weights.
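
The reduction from integer weights to an unweighted instance can be sketched as follows. The function name `expand_weighted_instance` and the encoding of copies as pairs `(v, k)` are illustrative assumptions.

```python
def expand_weighted_instance(weights, subsets):
    """Replace each element v by weights[v] copies (v, 0), (v, 1), ...

    Hypothetical helper: `weights` maps elements to positive integers and
    `subsets` is a list of sets of elements. Every copy of v belongs to
    exactly the subsets that contain v.
    """
    ground = [(v, k) for v, w in weights.items() for k in range(w)]
    expanded = [{(v, k) for v in s for k in range(weights[v])} for s in subsets]
    return ground, expanded


ground, expanded = expand_weighted_instance(
    {"a": 2, "b": 1, "c": 3}, [{"a", "b"}, {"b", "c"}]
)
print(len(ground), [len(s) for s in expanded])
```

The size of each expanded subset equals the total weight of the original subset, so part sizes in the unweighted instance correspond to part weights in the weighted one.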

General costs. Following the definition of Fukunaga, Halldorsson, and Nagamochi [13] for minimum cost
colorings, we now consider a much more general family of set cover problems. In these problems, we aim to
minimize a sum of some concave function $f(c_i)$ of the part sizes. The functions $f$ are concave in the sense
that they are discrete restrictions of concave functions $f : \mathbb{R}^+ \to \mathbb{R}$. We also assume $f(0) = 0$. Setting
$f(c_i) = -c_i^{p+1}$, for instance, yields a problem similar to the maximum $p$-mean set cover problem, without
the $1/p$ exponent. The definition of this new family is as follows.

Definition 3 (Set cover with general costs). Given an $n$-element ground set $V$ and a collection $\mathcal{S} =
\{S_1, \ldots, S_k\}$ of subsets of $V$ whose union is $V$, find a cover $\varphi : V \to \mathcal{S}$ that minimizes $\sum_{i=1}^{k} f(c_i)$,
where $c_i := |\varphi^{-1}(S_i)|$ and $f$ is a concave function.

Concavity implies that we seek a distribution of the part sizes that is as unbalanced as possible. In
particular, the following generalization of Lemma 3 holds.

Lemma 5. Given two nonincreasing sequences $(c_i)$ and $(c'_i)$, such that $(c_i)$ dominates $(c'_i)$, and a concave
function $f$, we have $\sum_{i=1}^{k} f(c_i) \le \sum_{i=1}^{k} f(c'_i)$.

Although the approximation ratio obtained with the greedy algorithm depends on the function $f$, we can
give a simple expression of it.

Theorem 8. The set cover problem with general costs can be approximated in polynomial time within a
factor of

$$\max \left\{ \frac{1}{f(c)} \sum_{j=1}^{c} \frac{f(j)}{j} \; : \; 1 \le c \le \max_j |S_j| \right\}.$$



Proof (sketch). Given a solution $\varphi$, we associate to each element $v \in V$ the cost $\frac{f(a_v)}{a_v}$, where $a_v =
|\varphi^{-1}(\varphi(v))|$ as before. The cost of this solution is the sum $\sum_{v \in V} \frac{f(a_v)}{a_v}$. Using concavity, we can bound
this sum in a greedy solution as in the proof of Lemma 1: we show that the sum over the elements in a part
of size $c$ in the optimal solution is at most $\sum_{j=1}^{c} f(j)/j$. The ratio follows. □

Note that we retrieve the approximation ratio $H_n$ of minimum set cover by setting $f(c) = 1$ if $c > 0$
and $f(0) = 0$. This result also encompasses our analyses of the approximability of minimum entropy and
maximum $p$-mean set cover.
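
The ratio suggested by the proof sketch, namely the maximum over part sizes $c$ of $\frac{1}{f(c)} \sum_{j=1}^{c} f(j)/j$, is straightforward to evaluate. The sketch below uses a hypothetical helper `greedy_ratio_general` and checks that a unit cost per nonempty part recovers the harmonic number.

```python
def greedy_ratio_general(f, max_size):
    """max over 1 <= c <= max_size of (sum_{j=1..c} f(j)/j) / f(c).

    Assumes f(c) != 0 for every c >= 1.
    """
    best, prefix = 0.0, 0.0
    for c in range(1, max_size + 1):
        prefix += f(c) / c
        best = max(best, prefix / f(c))
    return best


n = 20
harmonic = sum(1.0 / j for j in range(1, n + 1))
# Unit cost per nonempty part recovers the harmonic number H_n of minimum set cover.
assert abs(greedy_ratio_general(lambda c: 1.0, n) - harmonic) < 1e-12
```

A linear cost $f(c) = c$ gives ratio 1, reflecting that every cover then has the same total cost.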

We now give an application of this result to a new problem. In this problem, we suppose that the cost
of using a subset $S_i$ is 1 if $S_i$ covers a lot of elements, but is proportional to the number of covered elements
if the fraction of elements covered by $S_i$ is small. More precisely, if the fraction $c_i/n$ of elements covered
by $S_i$ is smaller than some constant $\beta < 1$, then the incurred cost is $\frac{c_i}{n} / \beta$. Otherwise, the cost is 1. Thus $\beta$
defines a breakpoint, above which it is less costly to "buy" the subset than to "rent" it. Hence we define the
Rent-or-Buy set cover problem as the set cover problem with the following cost function:

$$f(c) = \begin{cases} c / (\beta n) & \text{if } c < \beta n, \\ 1 & \text{otherwise.} \end{cases}$$

This models situations in which, for instance, jobs are assigned to machines, and machines can be either
bought or rented. The model was introduced recently by Fukunaga, Halldorsson, and Nagamochi as a graph
coloring problem [12]. The original description of the Rent-or-Buy model was on a weighted graph, and
the coloring problem was to find a coloring minimizing the sum of the values $\min\{1, w(C_i)\}$ over all color
classes $C_i$, where $w(C_i)$ is the sum of the weights of the vertices in $C_i$. From our reduction of weighted
instances described above, this is equivalent to our problem with $\beta = 1/w(V)$.
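
To get a feel for the Rent-or-Buy cost function, the sketch below evaluates the greedy ratio of Theorem 8, taken here to be the maximum over part sizes $c$ of $\frac{1}{f(c)} \sum_{j \le c} f(j)/j$ (our reading of the proof sketch), and compares it with $1 - \ln \beta$, the factor stated in Corollary 5. The values $n = 1600$ and $\beta = 1/16$ are arbitrary choices for illustration.

```python
import math


def rent_or_buy_cost(c, beta, n):
    """Cost of a part of size c: c / (beta * n) below the breakpoint, 1 otherwise."""
    return c / (beta * n) if c < beta * n else 1.0


n, beta = 1600, 0.0625   # breakpoint at beta * n = 100 elements
prefix, worst = 0.0, 0.0
for c in range(1, n + 1):
    prefix += rent_or_buy_cost(c, beta, n) / c
    worst = max(worst, prefix / rent_or_buy_cost(c, beta, n))

assert worst <= 1 - math.log(beta)
print(worst, 1 - math.log(beta))
```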

Corollary 5. The Rent-or-Buy set cover problem is approximable in polynomial time within a factor of
$1 - \ln \beta$.

Proof. We let $t = \beta n$, which we take to be an integer for simplicity, and evaluate the ratio of Theorem 8.
Let us first suppose that $c < t$. Then we have

$$\frac{1}{f(c)} \sum_{j=1}^{c} \frac{f(j)}{j} = \frac{t}{c} \sum_{j=1}^{c} \frac{1}{t} = 1.$$

Otherwise, if $c \ge t$, we have

$$\frac{1}{f(c)} \sum_{j=1}^{c} \frac{f(j)}{j} = \sum_{j=1}^{t-1} \frac{1}{t} + \sum_{j=t}^{c} \frac{1}{j} \le 1 + \ln \frac{c}{t} \le 1 + \ln \frac{n}{t} = 1 - \ln \beta.$$

□

Since the greedy algorithm can be implemented to run in polynomial time on perfect or claw-free graphs,
we obtain the following result on the Rent-or-Buy graph coloring problem.

Corollary 6. The Rent-or-Buy coloring problem is approximable in polynomial time within a factor of
$1 + \ln w(V)$ on perfect or claw-free graphs.

This improves on the 2-approximation algorithm [12] when the overall weight $w(V)$ does not exceed $e$.

A Proof of Theorem 5 and Corollary 2

The proof is similar to that of Lemma 1. We consider the approximate MaxIS algorithm in which a
$\rho$-approximate maximum independent set is chosen at each step. We consider a class $C_i$ in an optimal coloring,
of size $c_i$. The first vertex $v_1$ of $C_i$ that is colored by the approximate MaxIS algorithm will be assigned a
value $a'_{v_1}$ at least $c_i / \rho$, since there exists an independent set of size $c_i$ in the current graph. By iterating this
argument, we obtain that $\sum_{v \in C_i} (a'_v)^p \ge \rho^{-p} \sum_{j=1}^{c_i} j^p$. In the optimal coloring, the value of this color class
is $c_i^{p+1}$. Hence the ratio is at most

$$\left(\frac{n^{p+1}}{\rho^{-p} \sum_{j=1}^{n} j^p}\right)^{\frac{1}{p}} = \rho \left(\frac{n^{p+1}}{\sum_{j=1}^{n} j^p}\right)^{\frac{1}{p}}.$$

For positive values of $p$, combining with Lemma 2 yields an approximation factor of $\rho (p+1)^{1/p}$.

We now prove the corollary for the minimum entropy coloring problem. Using a greedy algorithm
for the maximum independent set, we have $\rho = (\Delta + 2)/3$ [18]. This ratio is valid for each step of the
algorithm, as the maximum degree of the graph cannot increase. From (2), the error term for the minimum
entropy problem is at most

$$\lim_{p \to 0} \log_2 \left(\frac{\Delta + 2}{3} (p+1)^{\frac{1}{p}}\right) = \log_2(\Delta + 2) + \log_2(e) - \log_2(3) < \log_2(\Delta + 2) - 0.14226.$$
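
The numerical constant in the last inequality can be checked directly; the short sketch below is only a sanity check of the arithmetic.

```python
import math

# log2(e) - log2(3): the constant term left after factoring out log2(Delta + 2).
constant = math.log2(math.e) - math.log2(3)
print(constant)
assert constant < -0.14226
```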